Preference Elicitation and Inverse Reinforcement Learning
نویسندگان
چکیده
We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent’s preferences, policy and optionally, the obtained reward sequence, from observations. We examine the relation of the resulting approach to other statistical methods for inverse reinforcement learning via analysis and experimental results. We show that preferences can be determined accurately, even if the observed agent’s policy is sub-optimal with respect to its own preferences. In that case, significantly improved policies with respect to the agent’s preferences are obtained, compared to both other methods and to the performance of the demonstrated policy.
منابع مشابه
Preference elicitation and inverse reinforcement learning
We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent’s preferences, policy and optionally, the obtained reward sequence, from observations. We examine the relati...
متن کاملBayesian Inverse Reinforcement Learning
Inverse Reinforcement Learning (IRL) is the problem of learning the reward function underlying a Markov Decision Process given the dynamics of the system and the behaviour of an expert. IRL is motivated by situations where knowledge of the rewards is a goal by itself (as in preference elicitation) and by the task of apprenticeship learning (learning policies from an expert). In this paper we sh...
متن کاملProbabilistic and Decision-Theoretic User Modeling in the Context of Software Customization
Research in the field of user modeling has aimed to supersede the current “one-size-fits-all” trend in software development that forces users to change their behaviour according to the preprogrammed functions. This paper discusses aspects of user modeling and its relevance to software customization. In particular, we focus on user modeling techniques that utilize probabilistic and decision-theo...
متن کاملBayesian Multitask Inverse Reinforcement Learning
We generalise the problem of inverse reinforcement learning to multiple tasks, from multiple demonstrations. Each one may represent one expert trying to solve a different task, or as different experts trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation, via a number of structured priors, whose form captures our biases about the ...
متن کاملEnabling Environment Design via Active Indirect Elicitation
Many situations arise in which an interested party wishes to affect the decisions of an agent; e.g., a teacher that seeks to promote particular study habits, a Web 2.0 site that seeks to encourage users to contribute content, or an online retailer that seeks to encourage consumers to write reviews. In the problem of environment design, one assumes an interested party who is able to alter limite...
متن کامل